1 Review

  • Making maps in R is really powerful and easy
  • The sf package handles point and polygon spatial data
  • sf objects are dataframes with an additional geometry column
library(urbnmapr)
library(urbnthemes)
library(tidyverse)
library(sf)
library(tigris)
set_urbn_defaults(style = "map")

2 Reading in spatial data

Out in the wild, spatial data comes in many formats but some of the common ones you will encounter are:

  • CSV’s with lon/lat columns
  • GeoJSON’s (.geojson)
  • Shapefiles (.shp)

the sf package has a function called st_read() that makes reading in any of these types of files easy.

For GeoJSONs and Shapefiles:

data <- st_read("path/to/geojson/file.geojson")
data <- st_read("path/to/shp/file.shp")

For CSV’s:

# You have two options!
# 1- Read in the file using st_read, but maniupulate some GDAL options
data <- st_read("path/to/csv/file.csv",
  options = c("X_POSSIBLE_NAMES=lon", "Y_POSSIBLE_NAMES=lat")
)

# 2- Read in the file as a data frame and convert it to an sf object
data <- read_csv("path/to/csv/file.csv")
data <- st_as_sf(data, coords = c("lon", "lat"))

NOTE: X = Longitude & Y = Latitude. You will at some point mix these two up, but try to keep them straight.

st_read can also accept URL’s!

data <- st_read("https://opendata.arcgis.com/datasets/287eaa2ecbff4d699762bbc6795ffdca_9.geojson")
## Reading layer `287eaa2ecbff4d699762bbc6795ffdca_9' from data source `https://opendata.arcgis.com/datasets/287eaa2ecbff4d699762bbc6795ffdca_9.geojson' using driver `GeoJSON'
## Simple feature collection with 256 features and 51 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -77.11113 ymin: 38.81718 xmax: -76.91108 ymax: 38.98811
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

Notice that it figured out what filetype it was and what kind of spatial data it was.

2.1 Exercise 0: Create a Project

Step 1: Create a new directory called mapping

Step 2: Open RStudio. Click “Project: (None)” in the top right corner. Click “New Project” and create a project based on the existing mapping directory.

Step 3: Open a .R script with the button in the top left. Save the script as 01_intro-to-mapping.R.

Step 4: Submit install.packages("tidyverse") to the Console.

Step 5: Submit install.packages("devtools") to the Console.

Step 6: Submit remotes::install_github("UrbanInstitute/urbnmapr") to the Console.

Step 7: Submit remotes::install_github("UrbanInstitute/urbnthemes") to the Console.

Step 8: Write library(tidyverse), library(urbnmapr), and library(urbnthemes) at the top of 01_intro-to-mapping.R. With the cursor on the line of text, click Control-Enter.

2.2 Exercise 1: Read in a GeoJSON

Step 1: Read in data on the locations of all fire stations in DC. Create a variable named fire_stations and using st_read, read in the following GeoJSON from this URL: https://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Safety_WebMercator/MapServer/6/query?where=1%3D1&outFields=NAME,ADDRESS,TRUCK,AMBULANCE&outSR=4326&f=geojson

Step 2: Type in fire_stations into your console and press enter. What does this SF dataframe look like? F

2.3 Exercise 2: Read in a CSV

Step 1: Copy and paste the following link into your browser to download the CSV to your computer. It’s a dataset on all crimes in DC within the last 30 days. https://geocoding-codestar-test.s3.us-east-2.amazonaws.com/crime_last_30_days.csv

Step 1: Create a folder called data in your mapping101 folder, and move the downloaded CSV into that folder

Step 2: Find the filepath to the CSV (it should be something like ‘data/example.csv’) and read it into R as an sf dataframe named crime_30. Note: You may need to open the CSV to see what the X and Y columns are labelled as. You can use any of the two methods described above

3 Writing out spatial data

sf has a function called st_write that can write out sf objects as GeoJSON’s, shapefiles, CSVS, Geopackages, or just about any other spatial format supported by GDAL. We recommend saving your spatial files as GeoJSON’s whenever you can for for a few reasons:

  1. It’s a universal file format that can easily be read back into R or other programs like ArcGIS, or Python
  2. It uses standardized projections and naming conventions, which saves you a lot of headaches down the road
  3. It’s a lightweight file format, especially for small to medium sized spatial files

Writing out GeoJSONs and Shapefiles:

st_write(sf_data, "output/path/filename.geojson")
st_write(sf_data, "output/path/filename.shp")

Writing out CSV’s

There may be some cases when you need to write out your spatial data as a CSV to share with other folks. sf does let you do that, but the lat/lon columns will either be labelled X/Y or be a single WKT column.

X Y
-76.9651 38.89204
WKT
POINT (-76.9651, 38.89204)

Here’s how you write out CSV’s both ways using sf_write:

# Writing out X and Y columsn to CSV
st_write(sf_data, "C:/Users/anarayanan/Downloads/exparkslocations.csv",
  layer_options = "GEOMETRY=AS_XY"
)

# Writing out WKT columns to CSV
st_write(sf_data, "C:/Users/anarayanan/Downloads/exparkslocations.csv",
  layer_options = "GEOMETRY=AS_WKT"
)

But sometimes, we want to write out CSVs with human readable Latitude and Longitude columns instead of X and Y columns as they are easy to mix up (quick - is X latitiude or longitude?).SF doesn’t give you the ability to do this easily. So we need to extract the coordinates from the SF dataframe, append them as columns to our dataframe, then rename them latitide/longitude.

# getting lon/lat columns from dataframe
coords = st_coordinates(data) %>% 
  as_tibble() %>% 
  dplyr::rename(lon = X, lat = Y)

# appending lon/lat columns to dataframe
data = data %>% 
  st_set_geometry(NULL) %>% 
  bind_cols(coords)

# writing out dataframe as csv
write_csv(data, "path/to/file.csv")

3.1 Exercise 4: Writing out Spatial data

Step 1: Let’s say we were only interested in relatively large fire stations. Use the filter() function to limit the fire_stations dataframe to stations that have more than 5 trucks (ie TRUCKS > 5). Call this filtered dataframe big_stations

Step 2: Write out the big_stations dataframe as a GeoJSON into the data\ folder we created earlier using st_write(). Call the file big_stations.geojson.

Step 3: Write out the big_stations dataframe as a CSV with lat/lon columns into the data\ folder we created earlier using st_write(). Call the file big_stations.csv.

4 Where can I find spatial data?

The tigris package in R is a great place to start looking. tigris provides spatial data for:

  • Census Tracts, Blocks, & Block Groups
  • Counties, ZCTA’s, PUMA’s, Places
  • congressinal districts, School Districts
  • roads, railways, native areas, military bases
  • Lots of other data sources!

It provides powerful functionality by allowing you to filter to specific states, counties, or tracts. Say for example I wanted data on all block groups in DC

dc <- tigris::block_groups(
  state = "DC",
  year = 2017,
  class = "sf",
  progress_bar = FALSE
)

ggplot(dc) +
  geom_sf()

There’s also a sister package called tidycensus that provides easy access to census data in addition to spatial data. Say for example you wanted data on the population counts in Montgomery County census tracts from from the 1 year ACS:

# Note you need to sign up for a free Census API key here:
# http://api.census.gov/data/key_signup.html

api_key = '12345678'
x = get_acs("tract", year = 2016, endyear = 2016, 
            state = "MD", county = "Montgomery", 
            geometry = TRUE, key = api_key)

Below are some other common places to find spatial data

4.1 Exercise 5: Using tigris

Step 1: Use the school_districts() function in tigris to download all the school districts in Oregon from the year 2015. Call the variable or_school_districts. Make sure to set class = "sf"!

Step 2: Use ggplot() and geom_sf() to make a plot of all the school districts in Oregon

4.2 Exercise 6: What are projections?

Step 1: Download the zipped shapefile of Toronto’s zoning boundaries from here: ftp://webftp.vancouver.ca/OpenData/shape/zoning_districts_shp.zip

Step 2: Unzip the contents of the shapefile into the data folder of the mapping101 directory.

Step 3: Find the filepath to the zoning_districts.shp file in the data folder and read in the shapefile into R. Name the object tor_zones.

Step 3: Type tor_zones into the column and take a look at the geometry column. What’s wrong with those points?

5 Coordinate Reference Systems

Whenever you create a map you have to make assumptions about 1) the exact 3d shape of the earth and 2) how to project that 3d shape onto a 2d surface. That’s where Coordinate Reference Systems (CRS) come in! There are two kinds of Coordinate Reference Systems:

  1. Geographic coordinate systems:
  • Are a 3d representation of the earth
  • Uses spheroid/ellpisoid surface to approximate shape of the earth
  • Usually use decimal degree units (ie latitude/longitude) to identify locations on earth

  1. Projected coordinate systems
  • Are 2d representations of the earth
  • Is a particular geographic coordinate system + a particular projection
    • projection: mathematical formula used to convert a 3d coordinate system to a 2d flat coordinate system
    • Many different kinds of projections, inluding Equal Area, Equidistant, Conformal
    • All projections distort the true shape of the earth in some way. Required xkcd comic
  • Usually use linear units (ie feet, meters)

5.1 Finding the CRS

In sf there’s a function called st_crs(). Let’s use that on our fire_stations dataframe

st_crs(fire_stations)
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"